Probabilistic Representations for Integrating Unreliable Data Sources
نویسندگان
چکیده
Databases constructed automatically through web mining and information extraction often overlap with databases constructed and curated by hand. These two types of databases are complementary: automatic extraction provides increased scope, while curated databases provide increased accuracy. The uncertain nature of such integration tasks suggests that the final representation of the merged database should represent multiple possible values. We present initial work on a system to integrate two bibliographic databases, DBLP and Rexa, while maintaining and assigning probabilistic confidences to different alternative values
منابع مشابه
A Probabilistic Model of Learning Fields in Islamic Economics and Finance
In this paper an epistemological model of learning fields of probabilistic events is formalized. It is used to explain resource allocation governed by pervasive complementarities as the sign of unity of knowledge. Such an episteme is induced epistemologically into interacting, integrating and evolutionary variables representing the problem at hand. The end result is the formalization of a p...
متن کاملRepresenting Meaning with a Combination of Logical and Distributional Models
NLP tasks differ in the semantic information they require, and at this time no single semantic representation fulfills all requirements. Logic-based representations characterize sentence structure, but do not capture the graded aspect of meaning. Distributional models give graded similarity ratings for words and phrases, but do not capture sentence structure in the same detail as logic-based ap...
متن کاملUncertain Groupings: Probabilistic Combination of Grouping Data
Probabilistic approaches for data integration have much potential [7]. We view data integration as an iterative process where data understanding gradually increases as the data scientist continuously refines his view on how to deal with learned intricacies like data conflicts. This paper presents a probabilistic approach for integrating data on groupings. We focus on a bio-informatics use case ...
متن کاملAnswering Queries from Statistics and Probabilistic Views
Systems integrating dozens of databases, in the scientific domain or in a large corporation, need to cope with a wide variety of imprecisions, such as: different representations of the same object in different sources; imperfect and noisy schema alignments; contradictory information across sources; constraint violations; or insufficient evidence to answer a given query. If standard query semant...
متن کاملLearning from Collective Intelligence in Groups
Collective intelligence, which aggregates the shared information from large crowds, is often negatively impacted by unreliable information sources with the low quality data. This becomes a barrier to the effective use of collective intelligence in a variety of applications. In order to address this issue, we propose a probabilistic model to jointly assess the reliability of sources and find the...
متن کامل